Distant Supervision for Tweet Classification Using YouTube Labels

Authors

  • Walid Magdy
  • Hassan Sajjad
  • Tarek El-Ganainy
  • Fabrizio Sebastiani
Abstract

We study an approach to tweet classification based on distant supervision, whereby we automatically transfer labels from one social medium to another. In particular, we apply classes assigned to YouTube videos to tweets linking to these videos. This provides for free a virtually unlimited number of labelled instances that can be used as training data. The experiments we have run show that a tweet classifier trained via these automatically labelled data substantially outperforms an analogous classifier trained with a limited amount of manu-

Similar resources

Simple Queries as Distant Labels for Predicting Gender on Twitter

The majority of research on extracting missing user attributes from social media profiles uses costly hand-annotated labels for supervised learning. Distantly supervised methods exist, although these generally rely on knowledge gathered using external sources. This paper demonstrates the effectiveness of gathering distant labels for self-reported gender on Twitter using simple queries. We confir...

Sentiment Analysis using Deep Convolutional Neural Networks with Distant Supervision

This thesis addresses the problem of predicting message-level sentiments of English micro-blog messages from Twitter. Convolutional neural networks (CNN) have shown great promise in the task of sentiment classification. Here we expand the CNN proposed by [31, 32] and perform an in-depth analysis to deepen the understanding of these systems. In a first step we compare the performance of differen...

Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction o...

Reducing Wrong Labels in Distant Supervision for Relation Extraction

In relation extraction, distant supervision seeks to extract relations between entities from text by using a knowledge base, such as Freebase, as a source of supervision. When a sentence and a knowledge base refer to the same entity pair, this approach heuristically labels the sentence with the corresponding relation in the knowledge base. However, this heuristic can fail with the result that s...

An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform

Large-scale datasets have played a significant role in the progress of neural networks and deep learning. YouTube-8M is one such benchmark dataset for general multilabel video classification. It was created from over 7 million YouTube videos (450,000 hours of video) and includes video labels from a vocabulary of 4716 classes (3.4 labels/video on average). It also comes with pre-extracted audio &...

Publication date: 2015